Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-4475: Fix NodeChildrenChanged delivered to recursive watcher #1820

Conversation

kezhuw
Copy link
Member

@kezhuw kezhuw commented Feb 22, 2022

Cause:

  • Events are delivered to recursive watcher unconditionally.
  • Client could receive NodeChildrenChanged due to child watcher on recursive watching node's descendants.

Changes: Filter out NodeChildrenChanged events for recursive watcher in client side.

@kezhuw
Copy link
Member Author

kezhuw commented Feb 25, 2022

Ping @eolivelli @Randgalt for review.

@kezhuw
Copy link
Member Author

kezhuw commented Mar 9, 2022

Another ping @eolivelli @Randgalt @symat @maoling for review.

@kezhuw kezhuw force-pushed the ZOOKEEPER-4475-fix-recursive-watcher-NodeChildrenChanged branch from 4751fab to de934cc Compare April 2, 2022 12:12
@kezhuw
Copy link
Member Author

kezhuw commented Apr 6, 2022

@eolivelli @Randgalt @symat @maoling @arshadmohammad @anmolnar Anyone have time capacity to review this pr ? I think @Randgalt has confirmed this as bug in ZOOKEEPER-4466.

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kezhuw Thanks for your contribution!

Generally looks good. Comments inline.

zk.create("/a/b/c", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

assertEvent(events, Watcher.Event.EventType.NodeCreated, "/a/b");
assertEvent(events, Watcher.Event.EventType.NodeCreated, "/a/b/c");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may assert no outstanding event here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add assertTrue(events.isEmpty()) to assert that no more events in queue.

synchronized (persistentWatches) {
addTo(persistentWatches.get(clientPath), result);
}
if (type == Watcher.Event.EventType.NodeChildrenChanged) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment about reason to filter NodeChildrenChanged event helps - it's not quite straightforward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments to explain why this is necessary and why this must be done in client side.

@tisonkun
Copy link
Member

Some other concerns:

  1. The description in this patch is different from JIRA's content:

This requires server to never deliver NodeChildrenChanged for node's descendants.

In this patch we filter event on the client side, which has nothing to do with server side logics.

  1. Do you have a real world workload needs this patch? I'm glad to see a concrete use case for PersistentRecursiveWatch and how the behavior changed here affects the application logic.

@kezhuw
Copy link
Member Author

kezhuw commented Jun 29, 2022

@tisonkun Thank you for reviews. I have pushed fixup commit to resolve your comments. Regarding your concerns, I try to reply them here:

  1. The next sentence after the quoted one is "It can not be true.". From server's perspective, it is valid to attach standard child watches on descendants of node which is being watched in persistent recursive mode. So, basically, server is innocent to this. It has to be solved in client side. I explains this in code comments in fixup commit also.
  2. No. I observed this in writing tests for zookeeper-client-rust. In the first place, I observed quirk behaviors in situations where standard watches and persistent watches are attached on overlapping paths(ZOOKEEPER-4466). After discussion with @Randgalt in ZOOKEEPER-4466, I split out ZOOKEEPER-4475 as a bug report and ZOOKEEPER-4466 remains more like a feature request.

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kezhuw kezhuw force-pushed the ZOOKEEPER-4475-fix-recursive-watcher-NodeChildrenChanged branch from f7bb53e to 1c5e5ee Compare June 29, 2022 14:31
@eolivelli
Copy link
Contributor

@tisonkun would you have time to test if Curator still works well with this change ? like running all the tests of Curator

@eolivelli
Copy link
Contributor

I'm glad to see a concrete use case for PersistentRecursiveWatch
@tisonkun for instance Apache Pulsar relies heavily on PersistentRecursiveWatch since version 2.8.0
you can take a look to Pulsar code

@tisonkun
Copy link
Member

@eolivelli I'll try to do a local test this week. Ping me if I miss it :)

@tisonkun
Copy link
Member

tisonkun commented Jun 30, 2022

@eolivelli since Curator doesn't support ZK 3.7+, I'm trying to pick this commit to 3.6.4-SNAPSHOT and run tests. However, I get a compile error when build with 3.6.4:

[ERROR] Bundle org.apache.zookeeper:zookeeper-jute:jar:3.6.4-SNAPSHOT : The default package '.' is not permitted by the Import-Package syntax.
 This can be caused by compile errors in Eclipse because Eclipse creates
valid class files regardless of compile errors.
The following package(s) import from the default package [org.apache.zookeeper.data, org.apache.zookeeper.proto, org.apache.zookeeper.server.persistence, org.apache.zookeeper.server.quorum, org.apache.zookeeper.txn]

I have no idea whether this failure comes from.

@kezhuw
Copy link
Member Author

kezhuw commented Jul 4, 2022

Hi @eolivelli, I have done a test based on #1859(which is a superset of this pr) and apache/curator#426 on branch-3.7. Follow is the verify progress copied from my shell. From my observation, there are flakes, but not failures.

PS: How the community share test report ? (I have deleted the test report from this reply as it is too long. You could see it from initial edit.)

@kezhuw kezhuw force-pushed the ZOOKEEPER-4475-fix-recursive-watcher-NodeChildrenChanged branch from 1c5e5ee to 3aac293 Compare July 7, 2022 08:48
@kezhuw kezhuw force-pushed the ZOOKEEPER-4475-fix-recursive-watcher-NodeChildrenChanged branch from 3aac293 to 602ff1d Compare July 20, 2022 00:07
@kezhuw kezhuw force-pushed the ZOOKEEPER-4475-fix-recursive-watcher-NodeChildrenChanged branch from 602ff1d to e33fddf Compare October 10, 2022 01:43
@sonatype-lift
Copy link

sonatype-lift bot commented Oct 10, 2022

⚠️ 52 God Classes were detected by Lift in this project. Visit the Lift web console for more details.

The semantics of persistent recursive watch promise no child events on
descendant nodes. When there are standard child watches on descendants
of node being watches in persistent recursive mode, server will deliver
child events to client inevitably. So we have to filter out child events
for persistent recursive watches on client side.
@kezhuw kezhuw force-pushed the ZOOKEEPER-4475-fix-recursive-watcher-NodeChildrenChanged branch from e33fddf to ff4750b Compare October 10, 2022 13:28
@kezhuw
Copy link
Member Author

kezhuw commented Oct 17, 2022

@kezhuw
Copy link
Member Author

kezhuw commented Feb 22, 2023

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have zookeeper-compatibility-tests-curator and I think it's OK to merge now.

@tisonkun
Copy link
Member

Pending to merge if no more objection this week.

@eolivelli
Copy link
Contributor

@tisonkun in ZooKeeper we do not allow to merge this kind of changes if there are not at least 2 sponsors.

Unfortunately currently there are not many active reviewers, but please always wait for a second approval before merging non trivial patches

@tisonkun
Copy link
Member

@eolivelli OK.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This is kind of a breaking change and we should deliver it only on 3.9.0 and not cherry-pick to 3.8.x

@eolivelli eolivelli merged commit 255b0c9 into apache:master Feb 22, 2023
@tisonkun
Copy link
Member

@eolivelli I agree that it changes the event delivered and am OK to deliver it on 3.9.0, while it can be regarded as a bug fix since in 3.6.0 release notes we write:

Note that NodeChildrenChanged events are not triggered for persistent recursive watches as it would be redundant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants